IRWIN AND JOAN JACOBS CENTER FOR COMMUNICATION AND INFORMATION TECHNOLOGIES
A Hybrid Text-to-Speech System that Combines Concatenative and Statistical Synthesis Units

Authors

Stas Tiomkin

David Malah

Slava Shechtman

Zvi Kons

Abstract

Concatenative synthesis and statistical synthesis are the two main approaches to text-to-speech (TTS) synthesis. Concatenative TTS (CTTS) stores segments of natural speech features, selected from a recorded speech database. Consequently, CTTS systems enable speech synthesis with natural quality. However, as the footprint of the stored data is reduced, desired segments are not always available in the stored data, and audible discontinuities may result. Statistical TTS (STTS) systems, on the other hand, have a smaller footprint than CTTS and synthesize speech that is free of such discontinuities. Yet, in general, STTS produces speech of lower naturalness than CTTS, as it often sounds muffled. The muffling effect is due to over-smoothing of the model-generated speech features. To gain from the advantages of both approaches, we propose in this work to combine CTTS and STTS into a hybrid TTS (HTTS) system. Each utterance representation in HTTS is constructed from natural segments and model-generated segments in an interleaved fashion via a hybrid dynamic path algorithm. Reported listening tests demonstrate the validity of the proposed approach.
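The abstract does not spell out the hybrid dynamic path algorithm itself. Purely as an illustration of the general idea, here is a minimal sketch of a generic Viterbi-style unit-selection search over a lattice in which each unit position offers recorded (natural) candidate segments alongside a model-generated one. The `Candidate` class, the squared-distance join cost, the per-candidate target costs (including a hypothetical naturalness penalty on model-generated segments) and the `join_weight` parameter are all assumptions, not the authors' cost functions.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


@dataclass(frozen=True)
class Candidate:
    """One candidate segment for a unit position (hypothetical representation)."""
    name: str
    start: Vector        # feature vector of the segment's first frame
    end: Vector          # feature vector of the segment's last frame
    target_cost: float   # mismatch to the desired unit; for a model-generated
                         # segment this may include an assumed naturalness penalty
    natural: bool        # True = recorded (CTTS) segment, False = model-generated (STTS)


def join_cost(prev: Candidate, nxt: Candidate) -> float:
    """Squared Euclidean distance across the boundary (stand-in for a real concatenation cost)."""
    return sum((a - b) ** 2 for a, b in zip(prev.end, nxt.start))


def best_hybrid_path(lattice: Sequence[Sequence[Candidate]],
                     join_weight: float = 1.0) -> Tuple[List[Candidate], float]:
    """Viterbi search: pick one candidate per unit, minimizing target + weighted join cost."""
    if not lattice or any(len(col) == 0 for col in lattice):
        raise ValueError("every unit position needs at least one candidate")
    # cost[j] = cheapest cumulative cost of a path ending in candidate j of the current column
    cost = [c.target_cost for c in lattice[0]]
    back: List[List[int]] = []  # back[t-1][j] = best predecessor index for lattice[t][j]
    for t in range(1, len(lattice)):
        prev_col = lattice[t - 1]
        new_cost, pointers = [], []
        for cand in lattice[t]:
            best_i = min(range(len(prev_col)),
                         key=lambda i: cost[i] + join_weight * join_cost(prev_col[i], cand))
            pointers.append(best_i)
            new_cost.append(cost[best_i]
                            + join_weight * join_cost(prev_col[best_i], cand)
                            + cand.target_cost)
        cost = new_cost
        back.append(pointers)
    # Trace back from the cheapest candidate in the last column
    j = min(range(len(cost)), key=cost.__getitem__)
    total = cost[j]
    path = [lattice[-1][j]]
    for t in range(len(lattice) - 1, 0, -1):
        j = back[t - 1][j]
        path.append(lattice[t - 1][j])
    path.reverse()
    return path, total


# Toy lattice: the only natural candidate for the middle unit has a large jump,
# while the model-generated one joins smoothly but carries a 0.5 penalty.
lattice = [
    [Candidate("nat_a", (0.0,), (1.0,), 0.0, True)],
    [Candidate("nat_b", (5.0,), (5.0,), 0.0, True),
     Candidate("stat_b", (1.0,), (2.0,), 0.5, False)],
    [Candidate("nat_c", (2.0,), (3.0,), 0.0, True)],
]
path, total = best_hybrid_path(lattice)
print([c.name for c in path], total)  # prints ['nat_a', 'stat_b', 'nat_c'] 0.5
```

With these toy numbers the discontinuity into the only natural middle segment costs more than the penalty on the model-generated segment, so the search interleaves natural, model-generated and natural segments. Raising the penalty or lowering `join_weight` would push the choice back toward the recorded segment.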


Related papers

IRWIN AND JOAN JACOBS CENTER FOR COMMUNICATION AND INFORMATION TECHNOLOGIES: Quality Preserving Compression of a Concatenative Text-To-Speech Acoustic Database

A Concatenative Text-To-Speech (CTTS) synthesizer requires a large acoustic database for high-quality speech synthesis. This database consists of many acoustic leaves, each containing a number of short, compressed speech segments. In this paper we propose two algorithms for re-compression of the acoustic database, by re-compressing the data in each acoustic leaf, without compromising the perce...


IRWIN AND JOAN JACOBS CENTER FOR COMMUNICATION AND INFORMATION TECHNOLOGIES: Statistical Text-To-Speech Synthesis based on Segment-wise Representation with a Norm Constraint

In statistical HMM-based TTS systems (STTS), speech feature dynamics are modelled by first- and second-order feature frame differences, which typically do not satisfactorily represent the frame-to-frame feature dynamics present in natural speech. The reduced dynamics result in over-smoothing of speech features, and the synthesized speech often sounds muffled and buzzy. In this work we propose a method...


IRWIN AND JOAN JACOBS CENTER FOR COMMUNICATION AND INFORMATION TECHNOLOGIES: Erasure/List Exponents for Slepian-Wolf Decoding


IRWIN AND JOAN JACOBS CENTER FOR COMMUNICATION AND INFORMATION TECHNOLOGIES: Universal Decoding for Gaussian Intersymbol Interference Channels


IRWIN AND JOAN JACOBS CENTER FOR COMMUNICATION AND INFORMATION TECHNOLOGIES: Gaussian beams scattered from different materials



Publication date: 2010
